ABSTRACT


A non-technical discussion and the general technical formulation 
of a statistical decision problem are given. Following this, statistical 
decision theory is used to solve a testing problem concerning a proto- 
type midget submarine. A set of rules to be followed in conducting the 
testing and reaching an optimum decision as to whether to accept the 
midget is developed. The development proceeds according to the 
Bayes solution of a statistical decision problem in which the stochastic 
variables are independently and identically distributed and limited to 
take only two values. Finally, brief discussions of the assumptions
and restrictions of statistical decision theory and the role of the
Minimax solution are included.
PREFACE


Operations research in the Navy is concerned with the establishment
of a quantitative basis for command decision. To help achieve this,
the naval operations analyst is constantly seeking more useful tools.
One such tool is the new theory of statistical decision functions, which,
though presently unexploited in application, holds promise of extensive
future use.

The theory of statistical decision functions was developed in the 
decade prior to 1950 by the late Abraham Wald. The development cul-
minated in the publication of his definitive book Statistical Decision
Functions. The book was written for mathematicians, and is too
cryptic for the reader of limited mathematical background. This fact,
along with the writer's belief that statistical decision theory can be of 
practical value to the naval operations analyst, prompted the present 
thesis. The thesis is intended as an introduction to the subject, and, 
except for Chapter I which is non-mathematical, is directed toward 
the reader who has studied calculus and has completed an elementary 
course in probability and statistics. The thesis purports to do no more
than present the most essential elements of statistical decision theory 
and the detailed solution of a simple special case. The reader inter- 
ested in a more mature treatment is referred to Wald. 

Source material for the paper has consisted primarily of Wald's 
book and notes taken by the author during a course of instruction in 
statistical decision functions given by Professor Thomas E. Oberbeck
at the United States Naval Postgraduate School. The contents are
arranged in five chapters and an appendix. Chapter I is a non-technical
discussion of a type of practical problem that may be solved by statisti- 
cal decision theory. The technical treatment begins in Chapter II, where 
the general formulation of the Bayes solution of the statistical decision 
problem is presented. Chapter III introduces certain assumptions 
needed to apply the theory, and Chapter IV treats an elementary special 
case. Chapter V deals with the Minimax solution. The Appendix gives 
a review of some selected mathematical concepts needed to understand 
better the technical discussions. It is recommended that the reader 
study the Appendix before beginning Chapter II.

The thesis was written during the period January - June, 1955
at the U. S. Naval Postgraduate School, Monterey, California. I wish 
to express my gratitude to the Navy for affording me the opportunity to 
write the thesis, to Professor Thomas E. Oberbeck for the technical 
competence and contagious enthusiasm he brought to his task as faculty 
advisor, to Professor Walter Jennings for helpful suggestions made 
while serving as second reader, and to Mrs. D. P. Slingerland for
painstaking clerical assistance.
CHAPTER I


A NON-TECHNICAL DISCUSSION 


1. Introduction.


In naval planning it is often necessary to predict the future use-
fulness of a proposed weapon or tactic. To do this, some value asso-
ciated with the weapon or tactic, such as percentage success, average
miss distance, or average life, is selected as a measure of the use-
fulness of the weapon or tactic. The problem then becomes one of es-
timating what this value, which we shall refer to as a parameter value,
would be in a future war.

The usual procedure for doing this is to conduct some trials. An
estimate of what the parameter value would be in a future war is obtain-
ed as a result of these trials. The important thing to note is that the
estimate is not guaranteed to be correct. We intuitively suspect that
the accuracy of the estimate increases as the number of trials conduct-
ed increases. Hence, the number of trials to be conducted is of funda-
mental importance.

The question of how many trials to conduct is often decided arbi-
trarily. Alternatively, if the services of a statistician are available,
the naval planner may determine the number of trials required to give, on
the average, an arbitrarily specified degree of confidence in the esti-
mate. In either case, some arbitrariness is retained.

Statistical decision theory adds a refinement to this procedure.
It employs a criterion based on probability theory to select an opti-
mum number of trials. The process involves a sort of cost analysis
of the problem. In practical situations the cost of conducting trials
will usually be significant, and a definite cost may be associated with
a poor estimate of the parameter value. To avoid the cost of the trials
the planner is led to conduct no trials, or only a few; to avoid the cost
of a poor estimate he is led to conduct a great number of trials. Obvi-
ously, the two considerations are opposed. The purpose of statistical
decision theory is to reconcile these two opposing considerations, and,
by the use of the criterion, to arrive at an optimum plan concerning
the number of trials to be conducted and the final decision to be reached.
Let us consider an example to see what this means.

2. Exhibit A.

Suppose the Navy is interested in a newly developed midget sub-
marine to be launched from a mother submarine and used to kill enemy
submarines. The question of detection is not under consideration, but
merely the capability of the midget to effect kills. It has been decided
that the device should be tested. Budgetary considerations, consider-
ations of priority of the services of the testing agency, etc. dictate the
necessity of answering the question: How many trials are likely to be
conducted? The question may be answered by using statistical deci-
sion theory. But before giving the answer, it is necessary to establish
some precepts to be used in reaching it. The technical meaning of
these precepts will be seen later, when we discuss each as a datum of
the statistical decision problem. For the moment, let us think of them
merely as the ingredients of a recipe. They must be put into the problem
if we are to obtain a solution.

The example is entirely fictitious (hence unclassified), and has been
chosen merely for illustration.

There are five precepts, and they are straightforward. First, we
decide to classify each trial of the midget submarine as a success or a
failure, according as the midget succeeds or fails to achieve a kill on
the trial. Then, in our problem, the percentage success of the midget
submarine in a future war is the unknown parameter value described in
the Introduction. Second, we must say something about the relative
likelihood of the various possible parameter values, i.e., the possible
values of the percentage success of the midget submarine in a future
war. Since we have no knowledge to the contrary, we assume that all
values in the range 0 to 100% are equally likely to occur. Third, we
decide to accept the midget if it succeeds on fifty percent or more of
its trials, and to reject it if it does not. This decision might be based,
for example, upon an assumption that present anti-submarine attack
methods will succeed from twenty-five to seventy-five percent of the
time in a future war. Fourth, we assume that the cost of each trial
will be the same, and a study of the tactical situation, forces involved,
etc. fixes the amount at $4000 per trial. Fifth, we have to establish
the cost of a wrong decision as to whether the midget is superior to
present anti-submarine attack methods. We may do this by making
a careful study. The study might consider such things as the cost of
producing the midget, the number that would be produced, the cost
of alternative weapons, etc., and it leads us to an estimate of the
cost of a wrong decision as shown in the following table:


                                  Percentage success of      Cost in
Decision                          midget in future war       Dollars

Accept midget (trial successes    more than 25%                      0
greater than 50%)

Accept midget (trial successes    less than 25%              1,000,000
greater than 50%)

Reject midget (trial successes    more than 75%              1,000,000
less than 50%)

Reject midget (trial successes    less than 75%                      0
less than 50%)

                     Table 1. Cost of Decision

Note that the first and last lines of the table represent correct deci-
sions and cost nothing, whereas the second and third lines represent
wrong decisions and cost a definite amount.

Let us elaborate upon the nature of these "costs" of wrong deci-
sion. They are not actual amounts of money that must be paid to
someone. Rather, they may be explained as follows: If a wrong deci-
sion is made, a certain disadvantage accrues to the Navy as a result.
The money evaluation of this disadvantage is called the cost of wrong
decision. This is much like the "cost" to a salesman who loses a
$300 commission because he elects to play golf instead of seeing a
prospective customer. He doesn't have to pay anyone the $300, but
he is nonetheless $300 worse off than he might have been. We say
he has made a "costly decision". In short, the cost of wrong decision
is the money equivalent of the loss suffered as a result of the wrong
decision.

In considering the costs shown in Table 1 it is necessary to keep
in mind the distinction between the percentage success observed in the
trials and the percentage success the midget submarine would have in
a future war. The first is an estimate of the second, and is not neces-
sarily correct. Note that, if the midget is definitely superior to pre-
sent anti-submarine attack methods (i.e., would have a percentage
success in a future war in excess of 75%) and we reject it, we must
penalize ourselves $1,000,000. This is the cost shown in the third
line of the table. Similarly, if the midget is definitely inferior to
present anti-submarine attack methods (would have a percentage
success in a future war of less than 25%) and we accept it, we must
again penalize ourselves $1,000,000. This is the cost shown in the
second line of the table. On the other hand, if the percentage success
of the midget in a future war is between 25% and 75%, we do not need
to penalize ourselves for either decision. This is not unreasonable
in view of our earlier assumption that present anti-submarine attack
methods have a percentage success of 25% to 75%. For, if the mid-
get would have a percentage success in the same range, we shall
consider that we have really neither gained nor lost by either accept-
ing or rejecting it.

The solution to the problem can now be given. It takes the form
of a table (Table 2). The question of how such a table is obtained is,
essentially, the subject of this paper. The actual detailed procedure
for obtaining this particular table is presented in Chapter IV. For
the moment, let us accept the table. We can then examine and inter-
pret it, so that we may gain an appreciation of the role of statistical
decision theory.

(See following page for Table 2.) 

Let us note the construction of the Table. The numbers identi-
fying the rows and columns designate, respectively, the number of
failures and successes of the midget submarine that have been obser-
ved in successive trials. Any set of one row designator and one col-
umn designator locates a square in the table. This square then applies
to the situation existing after the indicated number of failures and suc-
cesses have been observed. For example, the square in row number
three and column number five applies after three failures and five
successes have been observed. Each square contains two numbers
(of dollars). They have the following meanings:


upper number:  the anticipated cost (in dollars) to the Navy
               if no further trials are conducted, and a
               decision is made to accept or reject the
               midget on the basis of the trials conducted
               thus far.

lower number:  the anticipated cost (in dollars) to the Navy
               if trials are continued, and a final decision
               to accept or reject the midget is based on
               the results of further trials.

The choice of the words "anticipated cost" in defining these
two numbers has been carefully made. This is because the dollar
values represented by these numbers in the table are "expected
values" in the sense of probability theory. This is discussed in
the Appendix. What the reader must understand at this point is
that the numbers are not absolute like the $4000 cost per trial and the
$1,000,000 cost of wrong decision. Rather, they are values calculated
on the basis of the likelihoods of occurrence of the possible outcomes,
much as insurance companies calculate the life expectancy of man
from the relative frequency of deaths at each age.

The anticipated costs shown in Table 2 constitute the criterion
used to determine an optimum solution to the problem. Hence, the
solution is optimum relative to these costs as a criterion. Since
the nature of the criterion is probabilistic, the final decision, also, is
probabilistic. What this means, in practical terms, is that, if a rare
and unlikely series of results is obtained on the conducted trials, such
as success on every trial when actual future wartime employment will
yield a preponderance of failures, a poor decision will be made. This
is a chance that must be taken to avoid the great cost that would cer-
tainly occur if a very large number of trials were conducted. It does
not invalidate the theory any more than the survival of one individual
to age 106 invalidates the methods of insurance companies.

We may now proceed with the interpretation of the table.
Notice that the upper number is greater than the lower number in some
of the squares, and equal to it in others. Those in which it is greater
are enclosed within the double lines. At any stage of testing corres-
ponding to one of these enclosed squares, the anticipated cost is less
to continue taking trials than it is to reach a decision at that time. On
the other hand, at any stage of testing corresponding to a square out-
side the double lines, the anticipated cost is as little if trials are
halted, and a decision to accept or reject the midget submarine is made
on the basis of the sample already taken.

Now observe that the (0,0) position is within the double lines.
This means that the initial anticipated cost is least if some trials are
conducted. The number that will be conducted depends on the outcome
of the trials. We begin in the (0,0) position, and conduct a trial. If
it succeeds, we move right to the (0,1) position; if it fails, we move
down to the (1,0) position. In either case, the second position is still
within the double lines, so another trial is conducted. This process
is continued until a position outside the double lines is reached. This
may require anywhere from three to 13 trials. For example, if the
first three trials all succeed, position (0,3) will be reached. Here,
the upper entry ceases to be greater than the lower entry, so it will
pay to stop taking trials and decide, since the percentage success of
the trials conducted of 100% is greater than 50%, to accept the midget.
As another example, suppose the trial outcomes alternate from suc-
cess to failure to success, etc., in that order. This will result in
a stair stepping down the table, returning to the main diagonal
(number of successes equal to number of failures) on alternate trials.
Eventually we must arrive outside the double lines in position (6,7)
after 13 trials. The percentage success for the trials conducted is
then

\[ \frac{7}{13} \times 100 = 53.8\%, \]


and again the decision is made to accept the midget. If the sequen-
ce of outcomes leads to a position outside the double lines on the
upper side of the main diagonal, the percentage success of the trials
conducted will be greater than 50%, and the midget will be accepted; if
it leads to a position outside the double lines on the lower side of the main
diagonal, the percentage success of the trials conducted will be less than
50%, and the midget will be rejected.

With the aid of Table 2 it is now possible to answer the
earlier question of how many trials are likely to be conducted. The
answer consists of Table 2 and the following rule:

     Begin conducting trials, and, following each trial,
     note the position reached in the table. Continue
     this until a position outside the double lines is
     reached, then accept the midget if the number of
     successes exceeds the number of failures. Reject
     the midget if the number of failures exceeds the
     number of successes. The minimum number of
     trials required to reach a final decision will be
     three; the maximum number will be 13.
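
The rule can be illustrated with a minimal Python sketch. It assumes, beyond what this chapter states, that the two numbers in each square come from a backward induction over positions (failures, successes), with the uniform a priori assumption updated after each trial. The cap of 60 trials (HORIZON) and all function names are hypothetical choices made for illustration. The procedure actually used for Table 2 is the one presented in Chapter IV, and the sketch is not guaranteed to reproduce the table entry for entry.

```python
from functools import lru_cache
from math import comb

COST_PER_TRIAL = 4000.0             # fourth precept
WRONG_DECISION_COST = 1_000_000.0   # fifth precept (Table 1)
HORIZON = 60  # assumed cap on the number of trials, so the recursion ends


def posterior_cdf(x, failures, successes):
    """P(percentage success <= x) after the observed trials.

    Assumes a uniform prior, so the posterior is Beta(successes + 1,
    failures + 1); for integer parameters its CDF is a binomial tail sum.
    """
    n = failures + successes + 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(successes + 1, n + 1))


def stop_cost(failures, successes):
    """Upper number: anticipated cost of deciding now (better of the two)."""
    accept = WRONG_DECISION_COST * posterior_cdf(0.25, failures, successes)
    reject = WRONG_DECISION_COST * (1 - posterior_cdf(0.75, failures, successes))
    return min(accept, reject)


@lru_cache(maxsize=None)
def best_cost(failures, successes):
    """Lower number: anticipated cost when proceeding optimally from here."""
    stop = stop_cost(failures, successes)
    if failures + successes >= HORIZON:
        return stop
    return min(stop, continue_cost(failures, successes))


def continue_cost(failures, successes):
    """Anticipated cost of one more trial, then proceeding optimally."""
    p_success = (successes + 1) / (failures + successes + 2)
    return (COST_PER_TRIAL
            + p_success * best_cost(failures, successes + 1)
            + (1 - p_success) * best_cost(failures + 1, successes))


def run_rule(outcomes):
    """Follow the rule on a sequence of outcomes (1 = success, 0 = failure)."""
    failures = successes = 0
    remaining = iter(outcomes)
    while continue_cost(failures, successes) < stop_cost(failures, successes):
        outcome = next(remaining, None)
        if outcome is None:  # ran out of trials while still "inside the lines"
            return "undecided", failures + successes
        successes += outcome
        failures += 1 - outcome
    decision = "accept" if successes > failures else "reject"
    return decision, failures + successes
```

Under these assumptions, stop_cost(0, 3) is about $3906, which is less than the $4000 cost of a single further trial. This agrees with the statement above that it pays to stop at position (0,3).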

3. Another Aspect.

A direct solution of the problem has been given. Let us now
consider a possible budgetary complication. Suppose that $32,000
has been allotted to conduct the testing of the midget submarine. This
is, of course, an illogical amount in the light of statistical decision
theory. The solution does not divulge exactly how much the testing
will cost. It predicts only that from three to 13 trials will be required.
At $4000 per trial, this amounts to a cost of from $12,000 to $52,000.
The dilemma can only be resolved, in the light of statistical decision
theory, by getting the allotment changed to permit the flexibility
required by the solution. Failing this, an optimal decision may be
reached, but cannot be guaranteed. If it turns out that a position
outside the double lines, such as (5,3), is reached within eight trials, an
optimum decision will be reached in spite of the limitation. On the other
hand, if we are still inside the double lines after eight trials, such as in
position (4,4), sufficient data to indicate an optimum decision have not
been collected.

A variation of the budget problem is the case in which more than
$52,000 is available for the testing. In such a case, expenditure be-
yond $52,000 is, according to statistical decision theory, a waste of
funds. The solution will have indicated the optimum decision after 13
trials, if not before, and additional trials are not called for by the
theory.

4. Summary.

Exhibit A has been studied to help provide a conceptual under-
standing of what is involved in the type of solution of the testing problem
provided by statistical decision theory. It should be remembered that
the precepts, i.e., the decision to classify each trial as a success or
a failure, the decision to either accept or reject the midget, the speci-
fication of the cost of testing, the specification of the cost of wrong de-
cision, and the specification of the likelihood of various values of the
percentage success in a future war, are necessary inputs to the pro-
blem. Finally, the solution that is obtained is optimum relative to
the anticipated costs as the criterion, and the final decision is proba-
bilistic in nature.
CHAPTER II


GENERAL FORMULATION OF THE BAYES SOLUTION 


1. Basis of the Problem.

A datum of any problem is defined to be something, actual or
assumed, that is used as a basis for reckoning. The statistical de-
cision problem has five of these. In Exhibit A we considered them
intuitively as precepts. Let us now examine them in more technical
detail, and introduce a portion of the notation of statistical decision
theory.

a. Stochastic Process X: A stochastic process is defined as a
countable collection of stochastic (chance) variables having a joint
cumulative probability distribution. To explore this, let us think
of a countable collection of stochastic variables

\[ X = \{X_i\} = \{X_1, X_2, X_3, \ldots\}. \]

Let us next think of a countable set of real values, one for each
stochastic variable, i.e.,

\[ x = \{x_i\} = \{x_1, x_2, x_3, \ldots\}. \]

By definition (see Appendix), the joint cumulative probability distri-
bution of all the stochastic variables in the countable collection is
the probability that \( X_i < x_i \) simultaneously for all \( i \). In other
words, \( F(x) \) is the probability that \( X_1 < x_1 \), \( X_2 < x_2 \),
\( X_3 < x_3 \), \( \ldots \) simultaneously.


An important special case of a stochastic process should be
mentioned. It is the case where the stochastic variables \( X_1, X_2,
\ldots \) are independently and identically distributed. The condition
of independence means that the joint distribution function is the product
of the individual distribution functions. In this case,

\[ F(x) = G_1(x_1)\, G_2(x_2)\, G_3(x_3) \cdots = \prod_{i=1}^{\infty} G_i(x_i), \]

where \( G_i(x_i) \) is the distribution function of the \( i \)-th stochastic var-
iable. The condition that the stochastic variables be identically dis-
tributed means that the distribution of each stochastic variable has
not only the same form (such as normal or uniform), but also the
same parameter values. Thus, we might have

\[ G_i(x_i) = G(x_i; \mu, \sigma) \quad \text{for all } i, \]

where \( G(x_i; \mu, \sigma) \) denotes a normal distribution with mean \( \mu \) and
standard deviation \( \sigma \). In this case, we may write

\[ F(x) = F(x; \mu, \sigma) = \prod_{i=1}^{\infty} G(x_i; \mu, \sigma), \]

showing the dependence of \( F \) upon the values of the parameters,
\( \mu \) and \( \sigma \).

The stochastic process of Exhibit A is an example of one in
which the stochastic variables, \( X_i \), are assumed to be indepen-
dently and identically distributed. The outcome of each trial of the
midget submarine is considered to be a stochastic variable. Hence,
the result of the \( i \)-th trial constitutes the stochastic variable \( X_i \).
The possible particular outcomes of each trial, success or failure,
are thought of as representing particular values of the stochastic
variables. The assumption that every trial has the same chance of
succeeding as every other trial is equivalent to the assumption that
the stochastic variables are identically distributed. The two values to
which the stochastic variables are restricted (failure and success) are
denoted by 0 and 1 respectively. The common percentage success
of each stochastic variable, thought of as a parameter, is labeled \( p \)
(parameter value). This makes it possible to depict the stochastic
process diagrammatically by showing the distribution, \( G(x; p) \), of one
of the identically distributed stochastic variables as in Figure 1.


[Figure 1. Distribution of One of the Stochastic Variables of Exhibit A.
Left panel: bar graph of g(x). Right panel: step function G(x;p).]


In this case, we may write

\[ F(x) = F(x; p) = \prod_{i=1}^{\infty} G(x_i; p), \]

showing the dependence of \( F \) upon the parameter \( p \).
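
As a small illustration of the product form, the following Python sketch computes the probability of one particular sequence of trial outcomes for a given parameter value p. It works with the probability of each individual outcome (the bar graph of Figure 1) rather than with the cumulative distribution G, which is a simplification assumed here for illustration; the function name is hypothetical.

```python
from math import prod


def sequence_probability(outcomes, p):
    """Probability of a sequence of independent trials (1 = success, 0 = failure).

    Every trial succeeds with the same probability p, so the joint
    probability is the product of the individual probabilities.
    """
    return prod(p if outcome == 1 else 1.0 - p for outcome in outcomes)
```

For example, with p = 0.25 a success followed by a failure has probability 0.25 times 0.75.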

b. Space \( \Omega \): The space \( \Omega \) is defined to be a class of joint cumu-
lative probability distribution functions known to contain the true dis-
tribution, \( F(x) \), of the stochastic process. The elements of \( \Omega \)
are joint cumulative probability distribution functions and differ from
one another only in the values of their parameters. Hence, \( F(x; p_1) \),
\( F(x; p_2) \), \( \ldots \), \( F(x; p_n) \), \( \ldots \) are elements of the space \( \Omega \)
when \( F \) depends on a single parameter. For this reason, it is often
convenient, as well as illuminating, to think of \( \Omega \) as a parameter space.
Adopting this view in subsequent portions of this paper, we shall refer
to elements of the space \( \Omega \) as values of this parameter. The para-
meter is then regarded as a stochastic variable, \( P \), and, as such,
is liable to take on different values with different likelihoods. Note
that, following convention, we denote the parameter in its role as a
stochastic variable by using the capital letter \( P \), while parameter
values are denoted by the small \( p \). In short, \( \Omega \) is a class of simi-
lar joint cumulative probability distribution functions having different
parameter values and known to contain, as an element, the particular
joint cumulative probability distribution function having the correct
parameter value, or, as we have referred to it above, the true \( F \).
To determine an optimum way of estimating this parameter value is
the crux of the statistical decision problem.

In Exhibit A, the percentage success of the midget submarine
in a future war is the particular parameter value of interest. If
we knew it, there would be no problem. Since we do not, we regard
the unknown parameter as a stochastic variable, \( P \). We know only
that this stochastic variable is confined to range between 0 and
100%. It was stated, as a precept, that prior to any experimenta-
tion we would assume the true parameter value to be anywhere in
this range with equal likelihood. This is equivalent to saying that
the stochastic variable, \( P \), is continuous and that its a priori dis-
tribution is uniform. The uniform probability density function of
\( P \), \( \xi'(p) \), and the associated cumulative probability distribution
function, \( \xi(p) \), are shown in Figure 2.


[Figure 2. A Priori Distribution of the Parameter of Exhibit A.
Left panel: uniform density on p from 0 to 1. Right panel: the
associated cumulative distribution on p from 0 to 1.]
Note that \( p \) represents a possible value of the stochastic variable
\( P \), and \( \xi(p) \) represents the probability that \( P < p \).


c. Space \( D^t \): The space \( D^t \) is defined to be the space of possi-
ble final decisions. To illustrate \( D^t \), let us again refer to Exhibit
A. We recall that, at any stage of experimentation, we were always
faced with two alternative types of decisions, namely, to make a
final decision or to continue experimenting. These two types of
decisions are distinguished by defining two classes of decisions:

     \( D^t \): the class of all terminal decisions
     \( D^e \): the class of all decisions to continue experimenting,
          such as take one more trial or take two more stages
          of three trials each, etc.

Now, in Exhibit A, \( D^t \) consisted of two elements:

     \( d_1^t \): accept the midget.
     \( d_2^t \): reject the midget.

\( D^e \) consisted of a single element:

     \( d_1^e \): take one more trial.


In general, \( D^t \) and \( D^e \) are not so restricted, but may consist of as
many elements as needed to cover all possible decisions. This idea
is expressed symbolically by:

     \( D^t \) is a class consisting of \( d_1^t, d_2^t, \ldots \)

     \( D^e \) is a class consisting of \( d_1^e, d_2^e, \ldots \)


To illustrate the relation between \( D^t \) and \( D^e \), it is convenient to
define the class \( D \) as the class of all possible decisions. It is then
clear that \( D = D^t \cup D^e \). This is shown pictorially in Figure 3.

[Figure 3. Decision Space: the class D shown as the union of its two
subclasses.]

It will be recognized that the decisions in \( D^t \) and \( D^e \), taken
together, are exhaustive and mutually exclusive.


d. Weight Function \( W(p, d^t) \): The weight function is defined to
be a non-negative function, the value of which expresses the cost of
making the terminal decision \( d^t \) when the true parameter value is
\( p \). It is through the weight function that the cost of making a wrong
decision is introduced into the problem. If a correct decision is
made, the value of \( W(p, d^t) \) will be zero; if an incorrect decision
is made, \( W(p, d^t) \) may have a positive value. In general, the
weight function, like any function of two variables, may be depicted as
a surface as shown in Figure 4.





[Figure 4. Weight Function (General).]
In the special case of Exhibit A, this surface degenerates into two
curves in space as follows:

[Figure 5. Weight Function for Exhibit A: two curves in space, each
reaching a height of 1,000,000 dollars.]
Since there are only two elements in \( D^t \) for Exhibit A, it becomes more
instructive to represent Figure 5 as two curves as shown in Figure 6.

[Figure 6. Alternative Representation of Weight Function for Exhibit A.
Two panels, one for each terminal decision, plotted against p with
axis marks at 0, .25, .50, .75 and 1.00; each curve reaches a height
of 1,000,000 dollars.]

Either Figure 5 or Figure 6 is the equivalent of Table 1.
The weight function is the most difficult datum of the statisti-
cal decision problem to specify. Since it is a datum, it must be
known before a statistical decision problem can be solved. The
operations analyst must be able to specify its value for any values
of the arguments \( p \) and \( d^t \). This amounts to saying that he must
be able to assign a numerical cost to any combination of a possible
terminal decision and a possible parameter value. The question of
how to do this is one that needs extensive investigation, and offers
an opportunity for further study.
It is often possible and desirable to classify decisions as
merely right or wrong. In such case, the weight function, \( W(p, d^t) \),
takes on only values 1 and 0, and is said to be a simple weight
function. Except for a scaling factor of \( 10^6 \), this was the case in
Exhibit A.
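
The weight function of Exhibit A can be written out as a short Python sketch of Table 1. The function name and the use of the strings "accept" and "reject" for the two terminal decisions are hypothetical conveniences, and p is assumed to be expressed as a fraction between 0 and 1 rather than as a percentage.

```python
WRONG_DECISION_COST = 1_000_000.0  # dollars, from Table 1


def weight(p, decision):
    """W(p, d^t) for Exhibit A, with p given as a fraction in [0, 1]."""
    if decision == "accept":
        # Accepting is wrong only if the midget is definitely inferior.
        return WRONG_DECISION_COST if p < 0.25 else 0.0
    if decision == "reject":
        # Rejecting is wrong only if the midget is definitely superior.
        return WRONG_DECISION_COST if p > 0.75 else 0.0
    raise ValueError("decision must be 'accept' or 'reject'")
```

Dividing the returned value by WRONG_DECISION_COST gives the corresponding simple weight function, which takes only the values 1 and 0.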


e. Cost Function \( C(x, s) \): The cost function is defined to be a non-
negative function expressing the cost of experimentation. In general,
it depends on the values, \( x = x_1, x_2, \ldots \), obtained on the observa-
tions. It also depends on the variables observed in each stage of ex-
perimentation, and the number of stages, \( s = s_1, s_2, \ldots, s_k \),
observed. However, it may be possible, and is usually desirable,
to consider the special case in which the cost of experimentation is
the same for each experiment. Then the total cost of experimenta-
tion is proportional to the number of trials conducted. This was the
case in Exhibit A where each observation cost 4000 dollars, and the cost
function had a value of 4000 times the number of observations taken.


2. The Statistical Decision Function, \( \delta(x, s) \).

A statistical decision function, \( \delta \), is a set of rules which
estimates a parameter using the results of observations of a sto-
chastic process \( X \). It depends on the values \( x = x_1, x_2, \ldots \)
obtained on the observations and on the variables observed in each
stage of experimentation as well as the number of stages, \( s = s_1,
s_2, \ldots, s_k \). It is a function which prescribes a plan for conducting
experimentation and reaching a terminal decision. For example,
in Exhibit A the statistical decision function consisted of Table
2, from which instructions for experimenting and reaching a termin-
al decision were obtained. The problem of statistical decision theory
is, given the stochastic process \( X \), the space \( \Omega \), the space \( D^t \),
the weight function \( W(p, d^t) \) and the cost function \( C(x, s) \), to find the
statistical decision function that provides the optimum decision.

3. The Risk Function, \( r(p, \delta) \).

Each statistical decision function \( \delta \) is an element of the class \( \mathfrak{D} \)
of all statistical decision functions. To select that \( \delta \) from \( \mathfrak{D} \) which
provides the optimum solution to a statistical decision problem, a
criterion is needed. That is the role of the risk function. We have
already seen, from Exhibit A, that the criterion must take account
of the conflicting costs of experimentation and wrong decision. To
introduce these costs more precisely into the risk function, let us
define

     \( r_1(p, \delta) \): the expected cost of decision [expected value of
          \( W(p, d^t) \)] when \( p \) is true and \( \delta \) is used.

     \( r_2(p, \delta) \): the expected cost of experimentation [expected
          value of \( C(x, s) \)] when \( p \) is true and \( \delta \) is used.

Note that $r_1(p,\delta)$ and $r_2(p,\delta)$ are both expected
values. The meaning of "expected value" has been discussed briefly in
connection with the anticipated costs of Exhibit A; it is explained
more technically in the Appendix. Now, $r_1(p,\delta)$ and
$r_2(p,\delta)$ are, respectively, the expected (average) values of
$W(p,d^t)$ and $C(x,s)$ for given values of $p$ and $\delta$. That is,
$W(p,d^t)$ is averaged by the probability that $d^t$ will be made to
give $r_1(p,\delta)$, and $C(x,s)$ is averaged by the probability that
the values $x$ will be obtained when the stages,
$s = s_1, s_2, \ldots, s_k$, are observed to give $r_2(p,\delta)$. The
notion that these averages are obtained for a particular $(p,\delta)$
should be kept clearly in mind, for we shall subsequently require
expected values calculated with respect to the variables $p$ and
$\delta$. The risk function may now be defined to be the sum of the
expected values of the weight function and the cost function for given
values of $p$ and $\delta$. That is,

$$r(p,\delta) = r_1(p,\delta) + r_2(p,\delta).$$

Hence, the risk function, which may take on a value for any pair of
arguments $(p,\delta)$, represents the total expected cost associated
with these arguments.
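The definition can be made concrete with a minimal sketch. It assumes a very simple (hypothetical) decision function: take exactly `n` trials and accept if at least `k` succeed. The cost per trial and the weight function (accepting is penalized when $p < 1/4$, rejecting when $p > 3/4$) are taken to be those of Exhibit A; the function names are illustrative only.

```python
from math import comb

COST_PER_TRIAL = 0.004  # c, in millions of dollars (value quoted for Exhibit A)

def prob_accept(p, n, k):
    """P(at least k successes in n independent trials with success chance p)."""
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) for s in range(k, n + 1))

def risk(p, n, k):
    """r(p, delta) = r1 + r2 for the hypothetical fixed-sample rule (n, k)."""
    accept = prob_accept(p, n, k)
    if p < 0.25:        # accepting is the wrong decision (assumed weight function)
        r1 = accept
    elif p > 0.75:      # rejecting is the wrong decision (assumed weight function)
        r1 = 1.0 - accept
    else:               # neither decision is penalized
        r1 = 0.0
    r2 = COST_PER_TRIAL * n   # simple cost: proportional to the number of trials
    return r1 + r2
```

For instance, `risk(0.9, 2, 1)` adds the chance of wrongly rejecting a good prototype (both trials fail) to the cost of two trials.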


4. The Bayes Solution.

The goal of statistical decision theory is to select the particular
statistical decision function, $\delta_0$, that prescribes the optimum
plan concerning the number of trials to be conducted and the optimum
terminal decision based on the results of these trials. The risk
function is the basic criterion to be used in making this selection.
But the risk function, as we have seen, depends on both $p$ and
$\delta$ for its value. The dependence on $p$ makes it unsuitable, in
its present form, as a yardstick for comparing the relative merits of
various $\delta$. To overcome this difficulty, we need to remove the
dependency on $p$. This is accomplished by averaging out the $p$,
leaving a new function, the average risk, which depends on $\delta$
alone for its value. The values of the new function may be ordered as
to magnitude, and the magnitudes will vary with $\delta$ alone.

Let us elaborate on this. It often happens that a reasonable estimate
of the likelihood of $P$ taking on various values, $p$, can be given
at the outset. That is, the physics of the problem, a study of past
results, or even a shrewd analysis may provide an a priori
distribution of $P$. This simply means that we are able to specify,
either as an assumption or as a reasonable approximation, some
distribution function, $\xi(p)$, that describes the likelihood with
which $P$ will take on the values within its range of possible values.
From this point on, $\xi$ is assumed to be known.* If we now take the
expected value of the risk function with respect to the a priori
distribution of $P$, we get, in Wald's notation,

$$r(\xi,\delta) = \int_{\Omega} r(p,\delta)\, d\xi.$$

Notice that this average risk, averaged with respect to the a priori
knowledge of $P$, depends only on $\xi$ and $\delta$, and $\xi$ is
known. This is a significant result. It means that the average risk
can serve as the yardstick for comparing $\delta$, since it can be
ordered as to magnitude, and the magnitudes depend only on $\delta$.
Our interest, of course, is in selecting a particular $\delta_0$ that
makes the average risk the least. That is, we want a $\delta_0$ such
that

$$r(\xi,\delta_0) = \min_{\delta}\, r(\xi,\delta).$$

This is often alternatively expressed as

$$r(\xi,\delta_0) \le r(\xi,\delta) \quad \text{for all } \delta.$$


Such a $\delta_0$ constitutes a Bayes solution. Thus, a Bayes solution
is a $\delta_0$ which minimizes the average risk, $r(\xi,\delta)$,
with respect to all $\delta$. It is to be noted that a Bayes solution
is a solution relative to a particular a priori distribution $\xi$.
The procedure employed to arrive at a Bayes solution is summarized in
the block diagram of Figure 7.
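As an illustration of averaging out $p$ and then minimizing over $\delta$, the sketch below restricts attention to a small, hypothetical class of decision functions (take `n` trials, accept if at least `k` succeed) and assumes the data of Exhibit A: a uniform a priori density on $[0,1]$, a loss of 1 for accepting when $p < 1/4$ or rejecting when $p > 3/4$, and a cost of .004 per trial. The result is a Bayes solution only within this restricted class, not the sequential solution developed later.

```python
from fractions import Fraction
from math import comb

COST = Fraction(4, 1000)  # c, in millions of dollars

def beta_integral(i, j, x):
    """Exact integral of (1-p)^i * p^j over [0, x], by expanding (1-p)^i."""
    return sum(Fraction((-1) ** k * comb(i, k)) * x ** (j + k + 1) / (j + k + 1)
               for k in range(i + 1))

def average_risk(n, k):
    """r(xi, delta) for the fixed-sample rule (n, k) under a uniform prior."""
    one, lo, hi = Fraction(1), Fraction(1, 4), Fraction(3, 4)
    # Accepting with s successes is wrong when p < 1/4.
    wrong_accept = sum(comb(n, s) * beta_integral(n - s, s, lo)
                       for s in range(k, n + 1))
    # Rejecting with s successes is wrong when p > 3/4.
    wrong_reject = sum(comb(n, s) * (beta_integral(n - s, s, one)
                                     - beta_integral(n - s, s, hi))
                       for s in range(min(k, n + 1)))
    return wrong_accept + wrong_reject + COST * n

def best_fixed_rule(max_n):
    """Smallest average risk among rules with n <= max_n, as (risk, n, k)."""
    return min((average_risk(n, k), n, k)
               for n in range(max_n + 1) for k in range(n + 2))
```

Calling `best_fixed_rule(4)` orders the candidate rules by average risk alone, which is exactly the role the text assigns to the yardstick $r(\xi,\delta)$.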


* The case where $\xi$ cannot be specified is discussed in Chapter V.

[Figure 7. Block diagram of the procedure for arriving at a Bayes
solution.]


CHAPTER III


ASSUMPTIONS OF STATISTICAL DECISION THEORY


1. An Assumption Concerning Each Datum.

This chapter introduces some assumptions applied to the theory of
statistical decision functions to insure that solutions exist. A
complete study of the implications of these assumptions is not
attempted. Rather, the assumptions are briefly presented here merely
to acquaint the reader with the nature of the problem, so that he may
gain some insight into the character of the restrictions imposed by
the assumptions. A full treatment is given by Wald. One assumption
regarding each datum of the statistical decision problem is required.

a. Assumption 1: The assumption regarding the stochastic process $X$
is stated only for the case where the $x_i$ are independently and
identically distributed. In this case, it is assumed that the
stochastic process, $X$, is discrete or absolutely continuous. That
is, either each component stochastic variable is discrete, or it is
continuous and has a density function. Continuous stochastic variables
without density functions are not admitted.

b. Assumption 2: A convergence property regarding the space $\Omega$
is required. However, it is not necessary to explore the nature of
this property for our purposes, since Wald shows that it is a
consequence of Assumption 1. As such, it constitutes no additional
limitation.


c. Assumption 3: The weight function, $W(p,d^t)$, is a bounded
function of $p$ and $d^t$. Recalling that the weight function was
defined to be a non-negative function which describes the cost of
making any particular terminal decision, $d^t$, we see that this
assumption merely excludes the possibility of any decision costing an
infinite amount.


d. Assumption 4: The space $D^t$ is compact in the sense of the
metric

$$R(d_1^t, d_2^t) = \sup_{p}\, \bigl| W(p,d_1^t) - W(p,d_2^t) \bigr| .$$

This assumption is fulfilled if the space $D^t$ is finite. That is, if
the number of terminal decisions which may be made is finite, the
assumption is satisfied. This will cover most cases. However, if $D^t$
is not finite, the assumption can generally be satisfied by
restricting the range of the unknown parameter to a bounded space.
This restriction appears to present no practical difficulty.
e. Assumption 5: The cost function, $C(x,s)$, satisfies the
following three conditions:

(1) $C(x,s) \ge 0$ for all $x$ and $s$, and
    $C(x; s_1, \ldots, s_k, s_{k+1}) \ge C(x; s_1, \ldots, s_k)$.

(2) For any given $s$, the cost, $C(x,s)$, is either a bounded
    function of $x$ or $C(x,s) = \infty$ identically in $x$.

(3) There exists a sequence, $\{c_m\}$ ($m = 1, 2, \ldots$, ad inf.),
    of positive values such that
    $$\lim_{m \to \infty} c_m = \infty ,$$
    and $C(x,s) \ge c_m$ for all $x$, and for all
    $s = [s_1, \ldots, s_k]$ for which the set theoretical sum of
    $s_1, \ldots, s_k$ contains at least $m$ elements.


The meaning of this assumption concerning the cost of experimentation
is given in words as follows:

(1) The cost of experimentation cannot be negative, and the total
    cost of experimentation after an additional stage is taken cannot
    be less than it was before.

(2) The cost of experimentation is either finite or it is impossible
    to make observations of certain variables.

(3) Regardless of the values of the observations made or the number
    of stages employed in making them, if the total number of
    observations is at least $m$, then the cost, $C(x,s)$, of these
    observations is not less than the $m$-th term in some increasing
    sequence, $c_m$, which approaches infinity as a limit. The basic
    idea of this is that there exists some minimum value of the cost
    of observing $m$ variables beyond which it is impossible to reduce
    the cost of observing $m$ variables by rearranging the composition
    of the stages of experimentation. In other words, it is not
    possible to observe more variables for less money by taking the
    stages wholesale.


2. An Assumption Concerning the Space $\mathcal{D}$.

An assumption concerning the space $\mathcal{D}$ of admissible
decision functions is made in addition to the assumptions concerning
each datum. The most essential portion of the assumption is that only
those decision functions which prescribe a finite amount of
experimentation and which lead to a terminal decision are to be
considered.

3. Some Consequences of the Assumptions.

Regardless of how slight the cost of experimentation, if one
experimented an infinite amount the cost would increase without bound.
Therefore, there exists a point beyond which further experimentation
is not profitable. This intuitive notion is developed rigorously by
Wald when he shows that, even though we limit ourselves to decision
functions which prescribe a finite amount of experimentation, we can
still approach an optimum solution arbitrarily closely under the
assumptions of this chapter.

Subject to the assumptions of this chapter, a Bayes solution exists
for any given a priori distribution, $\xi$. If it is not practicable
to specify an a priori distribution, then the decision problem may be
viewed as a zero-sum, two-person game in the sense of von Neumann's
theory of games, and a minimax solution exists. A minimax solution is
a Bayes solution relative to the least favorable a priori
distribution. The minimax solution is discussed further in Chapter V.


CHAPTER IV


THE BAYES SOLUTION FOR A SPECIAL CASE


1. General.

The general formulation of the Bayes solution to the statistical
decision problem was given in Chapter II, and some of the theory
underlying its development was pointed out in Chapter III. In this
chapter, we shall undertake a progressive restriction of the general
problem until, ultimately, we arrive at the special case illustrated
by Exhibit A. Thereupon, the detailed solution of Exhibit A will be
indicated. The first step in this process will be to consider a
statistical decision problem in which the stochastic variables are
restricted to be independently and identically distributed, and the
cost of experimentation to be proportional to the number of
observations. Then we shall proceed to the case where the stochastic
variables are further restricted to take only two values. The
discussion of the latter will terminate with the solution of
Exhibit A.


2. Independently and Identically Distributed Stochastic Variables
with Simple Cost.

Recalling that the object of statistical decision theory is to find
the "best" decision function, we may readily see how the restrictions
we are imposing will help us. By restricting the cost function to be
simple, i.e., by requiring the cost of experimentation to be
proportional to the number of observations, we make it possible to
ignore the manner in which the observations are grouped or arranged.
That is, we may consider only those decision functions for which each
stage of experimentation consists of exactly one observation. Further,
by requiring the stochastic variables $x_i$ to be independently and
identically distributed, we eliminate the need for concern as to which
particular stochastic variables are observed. As a consequence, we may
limit the decision functions considered to those which not only
prescribe a single observation per stage, but also prescribe that the
stochastic variables will be observed in order. This is possible
because the stochastic variables, being identical, may be ordered in
any desired way.

In continuing our search for a "best" decision function, we may now
assert that, in choosing it, we need only compare the merits of
decision functions falling into the limited category explained in the
preceding paragraph. And since we are seeking a Bayes solution, the
decision function we ultimately select will be the one that is "best"
in the sense of the Bayes solution of Chapter II. The reader will
recall that the Bayes solution is given relative to an a priori
distribution $\xi(p)$ in $\Omega$, and that it consists of that
decision function, $\delta_0$, which minimizes the average risk, the
average being taken with respect to $\xi$ and the minimum over all
$\delta$. With these facts in mind, we may proceed with the process of
comparing the average risk produced by each $\delta$, and the choice
of the $\delta$ which produces the least average risk.

Let $m$ be a non-negative integer, and let $\delta^m$ denote a
decision function which guarantees that the total number of
observations will not exceed $m$. Then, for any a priori distribution
$\xi$, we may define

$$\rho_m(\xi) = \min_{\delta^m}\, r(\xi,\delta^m)$$

to be the least average risk that can be found by considering only
decision functions which guarantee no more than $m$ observations.
Similarly,

$$\rho(\xi) = \min_{\delta}\, r(\xi,\delta)$$

is the least average risk to be found by considering all decision
functions, whether or not they prescribe a finite number of
observations. A particular decision function that belongs to both
classes $\delta$ and $\delta^m$, which we will be interested in, is
$\delta^0$. This is the decision function which is guaranteed to
prescribe no observations. It is of interest because it enables us to
write

$$\rho_0(\xi) = \min_{\delta^0}\, r(\xi,\delta^0) = \min_{d^t}\, W(\xi,d^t),$$

where $W(\xi,d^t)$ denotes $W(p,d^t)$ averaged with respect to $\xi$.
This is an obvious, but important relation. It says simply that the
least average risk, if we consider only decision functions which
prescribe no experimentation, is equal to the minimum cost of
decision. This follows from the definition of risk (cost of
experimentation plus cost of decision) as given in Chapter II, and the
fact that no experimentation is involved.

Two remarks at this point may assist the reader in avoiding
misunderstanding. First, whereas cumulative distribution functions
(such as $\xi$) are usually employed in logical developments, the
corresponding density functions (such as $\xi(p)$) are more often used
in calculation. The distinction should be constantly remembered.
Second, the present chapter requires Assumptions 1-5 of Chapter III,
but does not require the assumption concerning $\mathcal{D}$, a fact
the reader may have surmised from the introduction of $\rho(\xi)$.
There are several theorems concerning the functions $\rho_m(\xi)$,
$\rho(\xi)$ and $\rho_0(\xi)$ which enable us to compare various
average risks and lead us to the Bayes solution. Perhaps the most
important of these is the recursion formula

$$\rho_{m+1}(\xi) = \min \left[ \rho_0(\xi),\;\; c + \int_{-\infty}^{\infty} \rho_m(\xi_a)\, d f^*(a|\xi) \right]. \tag{A}$$

We need to examine this formula carefully and understand it
thoroughly. It contains several symbols not given explicitly before.
They are

$a$: stands for a value that might be obtained if a stochastic
    variable were to be observed. When none is observed, but advance
    calculations are made with the thought in mind that one could be,
    then the symbol $a$ may be thought of as a stochastic variable
    itself.

$f^*(a|p)$: a cumulative distribution function for the stochastic
    variable $a$ described above that would exist if $p$ were the true
    parameter value of the joint cumulative distribution function
    $F(x)$.

$f^*(a|\xi)$: the expected cumulative distribution function of $a$
    obtained by calculating the expected value of $f^*(a|p)$. That is,
    $f^*(a|p)$ is weighted by the a priori knowledge, $\xi$, of the
    distribution of $p$ in $\Omega$ to obtain the average.

$c$: the cost of one observation.

$\xi_a$: the a posteriori cumulative distribution function of $P$ in
    $\Omega$ based upon the observation $a$. If $\xi$ is an a priori
    distribution and $a$ is the result of a single observation, then
    $\xi_a$ is an a posteriori distribution obtained by applying Bayes
    theorem (Appendix A) to modify $\xi$ to $\xi_a$, the modification
    being based upon the observation $a$.


Combining these notions, it is possible to paraphrase the recursion
formula as follows:

    the least average risk produced by decision functions which
    prescribe from 0 to m + 1 observations

        = the minimum of:

    (1) the least average risk produced by decision functions which
        prescribe no observation

    (2) the cost of one observation plus the expected value of the
        least average risk produced by decision functions which
        prescribe from 0 to m observations after the first one
This formula seems reasonable and its validity may be shown under the
assumptions of Chapter III. If we want to know the least average risk
to be had by allowing decision functions prescribing from 0 to $m+1$
observations, we can surely get at it by breaking the decision
functions we are allowing into two groups and picking the minimum one
of the two least average risks attainable from these two groups. If
the breakdown is made into (1) decision functions prescribing no
observation and (2) decision functions prescribing from 1 to $m+1$
observations, we are set up to select the minimum as indicated in the
recursion formula. The least average risk attainable from the first
group is simply $\rho_0(\xi)$, as previously defined. The least
average risk attainable from the second group is more complicated.
Since this group prescribes from 1 to $m+1$ observations, we are
certain to take at least one observation. This accounts for the $c$ in
the formula. After this one certain observation is taken, its value
being $a$, it is possible to modify the a priori distribution
$\xi(p)$ in $\Omega$ to an a posteriori distribution $\xi_a(p)$ in
$\Omega$ by the Bayes theorem of Appendix A. At this point we would
want to proceed by using the a posteriori distribution $\xi_a$, since
it is an improvement over the a priori distribution. To do so, we
would calculate the least average risk produced by decision functions
prescribing from 0 to $m$ more observations, that is, $\rho_m(\xi_a)$
as previously defined. This would give us an expression

$$c + \rho_m(\xi_a)$$

for the least average risk attainable from our second group of
decision functions. The reasoning thus far has omitted one subtle, but
key point. It is that the single observation $a$ is never actually
taken. Therefore, we must consider all possible values that $a$ might
take in a future observation. To do this we must regard the value $a$
as a stochastic variable, and compute an expected value of
$\rho_m(\xi_a)$ with respect to the distribution of $a$. This accounts
for the fact that the second choice on the right side of the formula
takes the form

$$c + \int_{-\infty}^{\infty} \rho_m(\xi_a)\, d f^*(a|\xi).$$
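The recursion can be sketched directly in code. The example below is illustrative only and makes several assumptions not in the text: the a priori distribution is concentrated on a finite set of values of $p$ (a dictionary mapping each $p$ to its probability), each observation $a$ is 0 or 1 with $P(a = 1 \mid p) = p$, and the two terminal decisions carry the 0-or-1 losses listed in `LOSSES`. With a discrete observation the Stieltjes integral reduces to a sum over the two values of $a$.

```python
def rho0(prior, losses):
    """Least average cost of an immediate terminal decision under the prior."""
    return min(sum(w * loss(p) for p, w in prior.items()) for loss in losses)

def update(prior, a):
    """Bayes theorem: return P(a) under the prior and the posterior given a."""
    joint = {p: w * (p if a == 1 else 1.0 - p) for p, w in prior.items()}
    marginal = sum(joint.values())
    if marginal == 0.0:
        return 0.0, prior
    return marginal, {p: v / marginal for p, v in joint.items()}

def rho(prior, m, c, losses):
    """rho_m(prior): least average risk using at most m further observations."""
    stop = rho0(prior, losses)
    if m == 0:
        return stop
    expected = 0.0
    for a in (0, 1):  # every value the not-yet-taken observation might have
        marginal, posterior = update(prior, a)
        if marginal > 0.0:
            expected += marginal * rho(posterior, m - 1, c, losses)
    return min(stop, c + expected)

# Hypothetical losses: accepting is wrong for p < 1/4, rejecting for p > 3/4.
LOSSES = (lambda p: 1.0 if p < 0.25 else 0.0,
          lambda p: 1.0 if p > 0.75 else 0.0)
```

With the prior `{0.1: 0.5, 0.9: 0.5}` and `c = 0.004`, stopping at once costs 0.5 on average, while allowing one observation lowers the least average risk to 0.104.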
Wald has shown that $\rho_m(\xi)$ will, for a sufficiently large value
of $m$, differ from $\rho(\xi)$ by an arbitrarily small amount. This
permits us to write

$$\lim_{m \to \infty} \rho_m(\xi) = \rho(\xi), \tag{B}$$

and leads us from formula (A) to the formula

$$\rho(\xi) = \min \left[ \rho_0(\xi),\;\; c + \int_{-\infty}^{\infty} \rho(\xi_a)\, d f^*(a|\xi) \right]. \tag{C}$$

This formula is presented in the notation of the Stieltjes Integral
(see Appendix A), and does not distinguish between the case where the
stochastic variable $a$ is discrete and the case where it is
continuous. If we desired to do so we could write

$$\rho(\xi) = \min \left[ \rho_0(\xi),\;\; c + \sum_{a} \rho(\xi_a)\, f^*(a|\xi) \right], \tag{C$_1$}$$

where $f^*$ is the bar graph of a discrete stochastic variable, and

$$\rho(\xi) = \min \left[ \rho_0(\xi),\;\; c + \int_{-\infty}^{\infty} \rho(\xi_a)\, f^*(a|\xi)\, da \right], \tag{C$_2$}$$

where $f^*$ is the density function of a continuous stochastic
variable.
The payoff of the preceding discussion lies in the manner in which
$\rho(\xi)$ and $\rho_0(\xi)$ may be used to obtain a Bayes solution.
It is best explained by Wald when he says:

    A Bayes solution relative to a given a priori probability
    measure $\xi_0$ can immediately be given in terms of the
    functions $\rho(\xi)$ and $\rho_0(\xi)$ as follows: If
    $\rho(\xi_0) = \rho_0(\xi_0)$, do not take any observations and
    make a final decision $d^t$ for which
    $W(\xi_0, d^t) = \rho_0(\xi_0)$. If
    $\rho(\xi_0) < \rho_0(\xi_0)$, take an observation on $X_1$ and
    compute the a posteriori probability measure $\xi_{x_1}$
    corresponding to $\xi_0$ and $x_1$. If
    $\rho(\xi_{x_1}) = \rho_0(\xi_{x_1})$, stop experimentation and
    make a final decision $d^t$ for which
    $W(\xi_{x_1}, d^t) = \rho_0(\xi_{x_1})$. If
    $\rho(\xi_{x_1}) < \rho_0(\xi_{x_1})$, take an observation $x_2$
    on $X_2$. In general, after the observations $x_1, \ldots, x_n$
    have been made, take an additional observation if
    $\rho(\xi_{x_1,\ldots,x_n}) < \rho_0(\xi_{x_1,\ldots,x_n})$, and
    stop experimentation with a proper terminal decision if
    $\rho(\xi_{x_1,\ldots,x_n}) = \rho_0(\xi_{x_1,\ldots,x_n})$,
    where $\xi_{x_1,\ldots,x_n}$ denotes the a posteriori
    probability measure corresponding to $\xi_0$ and
    $x_1, \ldots, x_n$.

3. Stochastic Variables Limited to Two Values.

The case where the $x_i$ are restricted to take only two values is
quite special. It will arise when the value of each variable may be
considered to be a failure or a success, as in Exhibit A. In such
cases, the values of the stochastic variables are taken as 0 and 1.
These correspond respectively to failure and success. The following
shorthand notation is used to describe cumulative distribution
functions.

$\xi_0$: an a priori cumulative distribution function of $P$ in
    $\Omega$.

$\xi_{ij}$: the a posteriori distribution of $P$ in $\Omega$ after
    $i$ 0's and $j$ 1's have been observed. $\xi_{00}$ is the same as
    $\xi_0$.

If there exists a positive integer $m$ such that

$$\rho_0(\xi_{mj}) \le c \quad \text{and} \quad \rho_0(\xi_{im}) \le c \qquad \text{for } i = 1, 2, \ldots, m;\;\; j = 1, 2, \ldots, m,$$

then it is clear from formula (C) that

$$\rho(\xi_{mj}) = \rho_0(\xi_{mj}) \quad \text{and} \quad \rho(\xi_{im}) = \rho_0(\xi_{im}) \qquad \text{for } i = 1, 2, \ldots, m;\;\; j = 1, 2, \ldots, m.$$

This may be explained in words as follows: Suppose an integer $m$
exists such that when either $m$ 0's or $m$ 1's have been observed,
and the attendant a posteriori distributions computed, it is found
that the least average risk attainable, by allowing, from this point
on, decision functions which prescribe no experimentation, does not
exceed $c$. Then from formula (C) the least average risk attainable by
allowing decision functions prescribing any amount of experimentation
is equal to that which is attainable by allowing only decision
functions which call for no further experimentation.

* Wald uses the term probability measure where we have been using
cumulative distribution function.

Let us now define $p_{ij}$ to be the probability of obtaining the
value 1 on a single observation when $\xi_{ij}$ is the a priori
distribution. That is,

$$p_{ij} = \int_{\Omega} f^*(1|p)\, d\xi_{ij}(p)$$

is the $f^*$ of formula (C$_1$). Then the probability of obtaining
the value 0 on a single trial is $1 - p_{ij}$. Using this notation,
the formula (C$_1$) of the preceding section may be adapted to the
case where the stochastic variables take only two values. It becomes

$$\rho(\xi_{ij}) = \min \left[ \rho_0(\xi_{ij}),\;\; c + p_{ij}\, \rho(\xi_{i,j+1}) + (1 - p_{ij})\, \rho(\xi_{i+1,j}) \right]. \tag{D}$$
It is this particular form of the formula, along with the defining
relation

$$\rho_0(\xi_{ij}) = \min_{d^t}\, W(\xi_{ij}, d^t) = \min_{d^t} \int_{\Omega} W(p,d^t)\, \xi_{ij}(p)\, dp \tag{E}$$

given earlier and the Bayes theorem of Appendix A that we shall use in
solving Exhibit A. The details of their use are best seen by studying
the detailed solution of the problem.


4. The Solution of Exhibit A.

The dollar values given in the original presentation of Exhibit A in
Chapter I may be multiplied by $10^{-6}$ without altering the
procedure followed in solving the problem. This amounts to expressing
all costs in millions of dollars. Making this simple transformation
and converting each original "precept" of the problem into a technical
datum, as subsequently introduced, we have given:


(1) the stochastic process: $x_i = 0$ (failure) or $x_i = 1$ (success)

[Figure 8. Distribution of X.]
(2) the a priori distribution in the parameter space:

[Figure 9. Distribution of P.]
(3) the decision space: $D^t$ consists of two elements:

    $d_1^t$: accept the midget submarine
    $d_2^t$: reject the midget submarine
(4) the weight function:

$$W(p,d_1^t) = \begin{cases} 0 & \text{for } p > \tfrac{1}{4} \\ 1 & \text{for } p < \tfrac{1}{4} \end{cases}
\qquad
W(p,d_2^t) = \begin{cases} 0 & \text{for } p < \tfrac{3}{4} \\ 1 & \text{for } p > \tfrac{3}{4} \end{cases}$$

[Figure 10. Weight Function.]

(5) the cost function: $c = .004$, the cost of a single experiment.
The Bayes solution to this problem, that is, the $\delta_0$ that we
seek, is a table. It is the same table that was given in Chapter I.
The upper entries in the cells of the table are values of
$\rho_0(\xi_{ij})$, while the lower entries are values of
$\rho(\xi_{ij})$. Hence the table provides the comparison of
$\rho_0(\xi_{ij})$ and $\rho(\xi_{ij})$ needed to determine how to
experiment and reach a terminal decision. Our immediate task is to
calculate these values of $\rho_0(\xi_{ij})$ and $\rho(\xi_{ij})$ to
complete the table. We may begin by calculating the values of
$\rho_0(\xi_{ii})$ for successive diagonal entries ($i = j$) using
formula (E) and Figures 9 and 10.


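$\rho_0(\xi_{00})$:

$$W(\xi_{00}, d_1^t) = \int_{\Omega} W(p,d_1^t)\, \xi(p)\, dp = \int_0^{1/4} (1)(1)\, dp + \int_{1/4}^{1} (0)(1)\, dp = \tfrac{1}{4}$$

$$W(\xi_{00}, d_2^t) = \int_{\Omega} W(p,d_2^t)\, \xi(p)\, dp = \int_0^{3/4} (0)(1)\, dp + \int_{3/4}^{1} (1)(1)\, dp = \tfrac{1}{4}$$

$$\rho_0(\xi_{00}) = \min_{d^t} W(\xi_{00}, d^t) = \min \left[ \tfrac{1}{4},\; \tfrac{1}{4} \right] = .2500$$

The remaining diagonal entries of $\rho_0(\xi_{ii})$ are computed
using the Bayes theorem (Appendix A, Case II) as well as formula (E)
and Figures 9 and 10.

$\rho_0(\xi_{11})$:

$$\xi_{11}(p) = \frac{(1-p)(p)(1)}{\int_0^1 (1-p)(p)(1)\, dp} = \frac{p - p^2}{\left[ \tfrac{p^2}{2} - \tfrac{p^3}{3} \right]_0^1} = 6p - 6p^2$$

$$W(\xi_{11}, d_1^t) = \int_0^{1/4} (1)(6p - 6p^2)\, dp + \int_{1/4}^{1} (0)(6p - 6p^2)\, dp = \left[ 3p^2 - 2p^3 \right]_0^{1/4} = \tfrac{5}{32}$$

$$W(\xi_{11}, d_2^t) = \int_0^{3/4} (0)(6p - 6p^2)\, dp + \int_{3/4}^{1} (1)(6p - 6p^2)\, dp = \left[ 3p^2 - 2p^3 \right]_{3/4}^{1} = \tfrac{5}{32}$$

$$\rho_0(\xi_{11}) = \min_{d^t} W(\xi_{11}, d^t) = \min \left[ \tfrac{5}{32},\; \tfrac{5}{32} \right] = .1563$$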
oe 


The procedure may be generalized for all diagonal entries ($i = j$) so
that we have

$$\rho_0(\xi_{ii}) = \frac{\int_0^{1/4} (1-p)^i (p)^i\, dp}{\int_0^{1} (1-p)^i (p)^i\, dp} = \frac{\int_{3/4}^{1} (1-p)^i (p)^i\, dp}{\int_0^{1} (1-p)^i (p)^i\, dp}$$

for all $i$. Values of this last expression may be obtained from
Tables of the Incomplete Beta Function.* The use of these tables
permits easy evaluation. Values obtained are the entries shown in the
upper halves of the diagonal cells in Table 3.
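Where the printed tables are not at hand, the same upper entries can be obtained by exact arithmetic. The following is a minimal sketch, assuming the uniform a priori density and the weight function of Exhibit A as used in the calculations above; `beta_integral` and `rho0` are illustrative names, and the integral is evaluated by expanding $(1-p)^i$ with the binomial theorem rather than by table lookup.

```python
from fractions import Fraction
from math import comb

def beta_integral(i, j, x):
    """Exact integral of (1-p)^i * p^j over [0, x], by expanding (1-p)^i."""
    return sum(Fraction((-1) ** k * comb(i, k)) * x ** (j + k + 1) / (j + k + 1)
               for k in range(i + 1))

def rho0(i, j):
    """Upper table entry after i failures and j successes (uniform prior)."""
    total = beta_integral(i, j, Fraction(1))
    wrong_accept = beta_integral(i, j, Fraction(1, 4)) / total      # P(p < 1/4)
    wrong_reject = 1 - beta_integral(i, j, Fraction(3, 4)) / total  # P(p > 3/4)
    return min(wrong_accept, wrong_reject)
```

Here `rho0(0, 0)` and `rho0(1, 1)` return the fractions 1/4 and 5/32, agreeing with the values .2500 and .1563 found by hand.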

* Tables of the Incomplete Beta Function, Pearson, University Press,
Cambridge, 1934.

The next step is to calculate the non-diagonal ($i \ne j$) upper
entries. This is done as follows:

$\rho_0(\xi_{23})$:

$$\xi_{23}(p) = \frac{(1-p)^2 p^3 (1)}{\int_0^1 (1)(1-p)^2 p^3\, dp} = \frac{p^3 - 2p^4 + p^5}{\left[ \tfrac{p^4}{4} - \tfrac{2p^5}{5} + \tfrac{p^6}{6} \right]_0^1} = 60\,(p^3 - 2p^4 + p^5)$$

$$W(\xi_{23}, d_1^t) = \int_0^{1/4} (1)\left[ 60\,(p^3 - 2p^4 + p^5) \right] dp = \left[ 15p^4 - 24p^5 + 10p^6 \right]_0^{1/4} = .0376$$

$$W(\xi_{23}, d_2^t) = \int_{3/4}^{1} (1)\left[ 60\,(p^3 - 2p^4 + p^5) \right] dp = \left[ 15p^4 - 24p^5 + 10p^6 \right]_{3/4}^{1} = .1694$$

$$\rho_0(\xi_{23}) = \min \left[ .0376,\; .1694 \right] = .0376$$

Again the procedure generalizes and we have, for $i < j$,

$$\rho_0(\xi_{ij}) = \frac{\int_0^{1/4} (1-p)^i (p)^j\, dp}{\int_0^{1} (1-p)^i (p)^j\, dp} .$$

For $i > j$ we have

$$\rho_0(\xi_{ij}) = \frac{\int_{3/4}^{1} (1-p)^i (p)^j\, dp}{\int_0^{1} (1-p)^i (p)^j\, dp} .$$

As before, the evaluation may be accomplished by use of the Tables of
the Incomplete Beta Function. Note that
$\rho_0(\xi_{ij}) = \rho_0(\xi_{ji})$. This makes it necessary to
evaluate entries on only one side of the main diagonal, since the
remaining entries may be determined by symmetry.

With the upper entries filled in, we turn our attention to the lower
entries. They may be determined in two stages. The first stage is just
to compare $\rho_0(\xi_{ij})$ with $c$ for each square. Since

$$\rho(\xi_{ij}) = \min \left[ \rho_0(\xi_{ij}),\;\; c + p_{ij}\, \rho(\xi_{i,j+1}) + (1 - p_{ij})\, \rho(\xi_{i+1,j}) \right], \tag{D}$$

we may immediately select $\rho_0(\xi_{ij})$ as the value of
$\rho(\xi_{ij})$ for all squares in which $\rho_0(\xi_{ij}) \le c$.
For those squares in which $\rho_0(\xi_{ij}) > c$ we must use formula
(D) and the formula for $p_{ij}$ to calculate $\rho(\xi_{ij})$. For
example, in the case of diagonal entries where $p_{ii} = \tfrac{1}{2}$,
we may compute

$$\rho(\xi_{99}) = \min \left[ \rho_0(\xi_{99}),\;\; c + p_{99}\, \rho(\xi_{9,10}) + (1 - p_{99})\, \rho(\xi_{10,9}) \right]$$

$$= \min \left[ .0090,\;\; .004 + \tfrac{1}{2}(.0039) + \tfrac{1}{2}(.0039) \right] = \min \left[ .0090,\; .0079 \right] = .0079$$


In the case of non-diagonal entries, the first step is to compute
$p_{ij}$ from the formula

$$p_{ij} = \int_{\Omega} f^*(1|p)\, d\xi_{ij} = \int_0^1 f^*(1|p)\, \xi_{ij}(p)\, dp .$$

Upon substituting in this formula we have

$$p_{ij} = \int_0^1 (p)\, \frac{(1-p)^i (p)^j}{\int_0^1 (1-p)^i (p)^j\, dp}\, dp = \frac{\int_0^1 (1-p)^i (p)^{j+1}\, dp}{\int_0^1 (1-p)^i (p)^j\, dp} .$$

This last expression may be evaluated using the Tables of the
Incomplete Beta Function. Once $p_{ij}$ is known, we have only to
solve formula (D) for $\rho(\xi_{ij})$. Values of $\rho(\xi_{ij})$
obtained in this way complete Table 3.

The best sequence for calculating the lower entries is as follows:
Fill in the main diagonal entry in the lower right hand corner first.
Then progress to the left in that row. Next, move up to the next
higher diagonal entry and again work left on the row. The entries on
the upper right hand side of the diagonal can be filled in by
symmetry.
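The whole calculation of upper and lower entries can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure used to prepare Table 3: it assumes the uniform a priori density and the weight function of Exhibit A, replaces table lookup by exact integration, uses the closed form $p_{ij} = (j+1)/(i+j+2)$ (which follows from $\int_0^1 (1-p)^i p^j\, dp = i!\,j!/(i+j+1)!$), and forces a terminal decision once $i + j$ reaches a hypothetical `HORIZON` so that the recursion is certain to end.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

COST = 0.004   # c, cost of one observation (millions of dollars)
HORIZON = 40   # assumption: stop at the latest when i + j reaches this value

def beta_integral(i, j, x):
    """Exact integral of (1-p)^i * p^j over [0, x], by expanding (1-p)^i."""
    return sum(Fraction((-1) ** k * comb(i, k)) * x ** (j + k + 1) / (j + k + 1)
               for k in range(i + 1))

@lru_cache(maxsize=None)
def rho0(i, j):
    """Upper entry: least average cost of deciding now, formula (E)."""
    total = beta_integral(i, j, Fraction(1))
    wrong_accept = beta_integral(i, j, Fraction(1, 4)) / total
    wrong_reject = 1 - beta_integral(i, j, Fraction(3, 4)) / total
    return float(min(wrong_accept, wrong_reject))

@lru_cache(maxsize=None)
def rho(i, j):
    """Lower entry: least average risk from state (i, j), formula (D)."""
    stop = rho0(i, j)
    if stop <= COST or i + j >= HORIZON:
        return stop  # one more observation already costs at least c
    p_ij = (j + 1) / (i + j + 2)
    cont = COST + p_ij * rho(i, j + 1) + (1 - p_ij) * rho(i + 1, j)
    return min(stop, cont)
```

Memoizing `rho` has the same effect as filling the table from the lower right corner: each cell is computed once, after the two cells it depends on.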

The interpretation of the table, as given in Chapter I, may now be
stated in terms of the technical notation. Begin taking observations
and after each observation compare $\rho(\xi_{ij})$ with
$\rho_0(\xi_{ij})$. As long as $\rho(\xi_{ij})$ is less than
$\rho_0(\xi_{ij})$, continue taking observations. When an observation
is made such that $\rho(\xi_{ij}) = \rho_0(\xi_{ij})$, stop
experimentation and make a proper terminal decision. If $i > j$, the
terminal decision will be to reject the midget submarine. If
$i \le j$, the terminal decision will be to accept the midget
submarine.
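The rule just stated can be sketched as a short routine. It is illustrative only: `rho` and `rho0` are assumed to be functions of $(i, j)$ supplied by the caller (for example, lookups into the completed table), and `outcomes` is a hypothetical list of trial results with 1 for success and 0 for failure.

```python
def run_procedure(outcomes, rho, rho0):
    """Follow the table: observe while rho < rho0, then decide.

    Returns (decision, number_of_observations); decision is None if the
    supplied outcomes run out before the table says to stop.
    """
    i = j = 0  # failures and successes observed so far
    for x in outcomes:
        if rho(i, j) >= rho0(i, j):
            break  # equality reached: no further observation is worthwhile
        if x == 1:
            j += 1
        else:
            i += 1
    if rho(i, j) < rho0(i, j):
        return None, i + j
    decision = "reject" if i > j else "accept"
    return decision, i + j
```

Passing the table in as two functions keeps the routine independent of how the entries were computed.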
CHAPTER V


THE MINIMAX SOLUTION


1. The Minimax Solution and its Relation to the Bayes Solution.

The scope of this paper, for detailed discussion, is limited to the
Bayes solution of the statistical decision problem. Emphasis is given
to the special case in which the $x_i$ are independently and
identically distributed, and confined to take only two values.
However, to avoid having the reader assume that this constitutes all
of statistical decision theory, mention should be made of the Minimax
solution.

It was pointed out in Chapter II that the Bayes solution is always
given relative to an a priori distribution of the unknown parameter.
If such a distribution cannot be given, it may still be possible to
solve a statistical decision problem. A solution may be obtained by
viewing the decision problem as a zero-sum, two-person game, and
solving the game. A solution obtained in this manner is termed a
Minimax solution. A Minimax solution may also be obtained in other
ways. A Minimax solution, as noted in Chapter III, is a particular
Bayes solution. Specifically, it is that Bayes solution which is given
relative to the least favorable a priori distribution of the unknown
parameter.

The difference between the Bayes solution and the Minimax solution
lies in the choice of a yardstick for comparing the relative merits of
the various decision functions. The basic criterion, in either case,
is the risk function. But the modification of this criterion, to
arrive at the final yardstick, is different. The reader will recall
from Chapter II that, for a Bayes solution, the risk function,
$r(p,\delta)$, was modified to an expected risk, $r(\xi,\delta)$, by
averaging out the $p$, and this expected risk constituted the final
yardstick. The modification was accomplished by using the a priori
distribution, $\xi(p)$. The expected risk, which could then be ordered
as to magnitude where the magnitude depended on $\delta$ alone,
permitted the selection of the particular statistical decision
function, $\delta_0$, that provided the least expected risk and hence
the optimum solution relative to the assumed $\xi(p)$. In the case of
a Minimax solution, we consider that an a priori distribution is not
available. Hence, the procedure employed to modify the risk function
to a suitable final yardstick must be altered. The procedure that is
used consists of taking the maximum risk in place of the expected
risk. An a priori distribution of $P$ is not required to do this. We
simply take the maximum value of the risk, $r(p,\delta)$, for each
$\delta$, by selecting the $p$ that maximizes it. That is,

$$\text{Max risk} = \max_{p \in \Omega}\, r(p,\delta).$$


This new function, the maximum risk, is dependent on $\delta$ alone, and
can therefore be ordered as to magnitude with the magnitude
determined by $\delta$. Again we select the particular $\delta_0$ that minimizes
the yardstick. That is, we take

$$\min_{\delta} \max_{p \in \Omega} r(p,\delta).$$

This is sometimes written

$$\max_{p \in \Omega} r(p,\delta_0) \le \max_{p \in \Omega} r(p,\delta) \quad \text{for all } \delta.$$

The $\delta_0$ of this latter expression constitutes a Minimax solution.
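
The two yardsticks may be compared on a small numerical case. The following is only a sketch in Python and is not part of the original development: the three decision functions, the three parameter values, the risk figures and the a priori weights are all hypothetical, chosen so that the Bayes and Minimax selections differ.

```python
# Hypothetical risk table r(p, delta): one row per decision function,
# one column per value of the unknown parameter p.
risk = {
    "delta_1": [1.0, 4.0, 9.0],
    "delta_2": [5.0, 5.0, 5.0],
    "delta_3": [8.0, 3.0, 2.0],
}

# Hypothetical a priori weights on the three values of p (they sum to 1).
prior = [0.2, 0.5, 0.3]


def expected_risk(risks, weights):
    """Average the risk over p using the a priori weights."""
    return sum(r * w for r, w in zip(risks, weights))


def bayes_solution(risk_table, weights):
    """Decision function with the least expected risk."""
    return min(risk_table, key=lambda d: expected_risk(risk_table[d], weights))


def minimax_solution(risk_table):
    """Decision function with the least maximum risk."""
    return min(risk_table, key=lambda d: max(risk_table[d]))


bayes_choice = bayes_solution(risk, prior)
minimax_choice = minimax_solution(risk)
print(bayes_choice, minimax_choice)
```

With these assumed figures the expected risks are 4.9, 5.0 and 3.7, so the Bayes yardstick selects delta_3, while the maximum risks are 9, 5 and 8, so the Minimax yardstick selects delta_2.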


The statement that the Minimax solution is a Bayes solution
relative to the least favorable a priori distribution now seems reasonable.
For, if our a priori distribution for a Bayes solution were the least
favorable of all, it would lead us to the $\max_{p \in \Omega} r(p,\delta)$ as a yardstick.

The procedure used to arrive at a Minimax solution is summarized
in Figure 11 (see following page).

4. Relation to the Theory of Games.

The reader familiar with von Neumann's theory of games will
recognize the procedure of the preceding section as essentially the same as
that of game theory. In fact, Wald points out the detailed
correspondence between the statistical decision problem in which no a priori
distribution is given and the zero sum, two person game. In the general
case, the corresponding game is a continuous one. This means that the
question of the strict determinateness of the game must be investigated.
Whereas the fundamental theorem of rectangular games assures the
existence of a solution to any finite game, no such assurance exists in the
case of all infinite games. However, Wald demonstrates that, under
suitable assumptions, any statistical decision problem viewed as a
continuous game may be approximated arbitrarily closely by a finite game.
This means that, even if the continuous game is not strictly determined,
no practical limitation is imposed. The detailed procedure employed in
arriving at a Minimax solution of a statistical decision problem in this
manner involves the formulation of the problem as a game, and the
solution of the game. It will not be covered here.




5. Summary. 
When an operations analyst is confronted with the need to make a
decision on the basis of the results of conducted trials, the accuracy
of which depends upon the true value of an unknown parameter, and the
cost of the experimentation required to estimate the value of the
parameter is significant, a statistical decision problem is indicated. As
Wald puts it, in two sentences here taken out of context,

     "A statistical decision problem is formulated with reference
     to a stochastic process. ... A statistical decision problem
     with reference to a stochastic process X arises only when
     the distribution F(x) is not completely known."

Once a statistical decision problem has arisen, it must be possible to
specify the stochastic process, the parameter space, the space of
terminal decisions, the weight function and the cost function, in order
to solve it. A solution consists of determining the particular
statistical decision function that prescribes the optimum plan for
conducting experimentation and reaching a terminal decision.

The procedure employed to reach a solution involves the use of
a risk function as a basic criterion for selecting the optimum decision
function. This risk function takes account of both the cost of wrong
decision and the cost of experimentation. If an a priori distribution
of the unknown parameter can be given, the final yardstick for
selecting the optimum decision function is the average risk; if not, the
final yardstick is the maximum risk. In either case, the yardstick
is ordered as to magnitude, and that decision function which
provides the least value of the yardstick is selected as a solution.
The first case yields a Bayes solution; the second a Minimax solution.

The consequence of the final decision is probabilistic. This







means that the final decision may, in a particular instance, conceivably
be a poor one. Nonetheless, the theory offers a rational approach to
the type of problem it fits, and is superior to any other known approach.
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APPENDIX A 


SOME SELECTED MATHEMATICAL CONCEPTS 


1. Probability.

Probability is a quantitative measure of the likelihood of the
occurrence of events. It is expressed by assigning a number in the
range [0, 1] to any specific event. For example, if an event is
certain to occur it has probability 1; if it is certain not to occur
it has probability 0. If an event has a fifty-fifty chance of occurring
it has probability 1/2. The probability of an event may be estimated
by conducting repeated trials and employing the formula

$$\text{Probability} = \frac{\text{number of successes}}{\text{number of trials}}.$$
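
The estimation formula may be tried on simulated trials. The sketch below is an illustration only; the fair-coin trial, the seed, the number of trials and the function name `estimate_probability` are assumptions introduced here and do not come from the text.

```python
import random


def estimate_probability(trial, n_trials, rng):
    """Relative frequency: number of successes divided by number of trials."""
    successes = sum(1 for _ in range(n_trials) if trial(rng))
    return successes / n_trials


def fair_coin(rng):
    """Hypothetical trial that succeeds with probability 1/2."""
    return rng.random() < 0.5


rng = random.Random(0)  # fixed seed so that the run is repeatable
estimate = estimate_probability(fair_coin, 10_000, rng)
print(estimate)
```

The estimate should fall close to 1/2, and in general it approaches the true probability as the number of trials is increased.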


2. Stochastic Variables.

A stochastic variable may be defined to be a function which
associates a real number with every possible outcome of an experiment.
The outcome of any particular performance of the experiment is said
to be a value assumed by the stochastic variable, it being understood
that this outcome is a chance occurrence. A stochastic variable is
termed discrete if the distinct values which it may assume are
either finite in number or may be arranged in a sequence (i.e., are denumerable).
It is termed continuous if its possible values may be represented by an
interval on the real line, e.g., all the points x such that a < x < b
or -∞ < x < ∞.


3. The Distribution of a Discrete Stochastic Variable. 


The correspondence between the values of a discrete stochastic
variable and the probabilities that it will take on these values may be described
either by a probability function (bar graph) or by a cumulative probability
distribution function (step function). As an example of this, consider a
single true die to be tossed a large number of times. A mathematical
description of the stochastic nature of this experiment may be
formulated as follows:

     X: a stochastic variable representing the value shown
        on the die after any throw.

     x: real values which may be assumed by the stochastic
        variable X, i.e., 1, 2, 3, 4, 5, and 6.

     G(x): the probability that X will take on a value less than
        or equal to x. G(x) = Pr(X ≤ x).

     g(x): the probability that X will take on the value x.
        g(x) = Pr(X = x).


These quantities may be displayed as follows: 

[Figure 12. Distribution of a Discrete Stochastic Variable. Left panel:
bar graph of g(x); right panel: step function of G(x); x runs from 1 to 6.]

The bar graph indicates that the probability of tossing any particular
number on a given throw is the same for all numbers, and is equal to
1/6. The step function is an alternative way of presenting essentially
the same information. It permits the probability that a toss will show a
value less than or equal to any given value to be read directly. For
instance, the probability that the die will show three or less on a throw is

$$G(3) = \frac{1}{2},$$

a result that would be anticipated. It is to be noted that

$$\sum_{i=1}^{6} g(x_i) = 1 \quad \text{and} \quad g(x_i) \ge 0 \quad (i = 1, \dots, 6),$$

where the $x_i$ are the six values 1, 2, ..., 6. Also,

$$G(0) = 0 \quad \text{and} \quad G(6) = 1.$$

These are fundamental relations associated with the probability
function and cumulative probability distribution function of the stochastic
variable X.
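
The relations for the true die may be verified directly. The following sketch uses exact fractions; the names `values`, `g` and `G` simply mirror the notation of this section, and the code itself is an illustration added here.

```python
from fractions import Fraction

# Probability function g(x) of a true die: each face has probability 1/6.
values = [1, 2, 3, 4, 5, 6]
g = {x: Fraction(1, 6) for x in values}


def G(x):
    """Cumulative distribution function: Pr(X <= x)."""
    return sum((prob for value, prob in g.items() if value <= x), Fraction(0))


print(G(3))             # probability of three or less
print(sum(g.values()))  # total probability
print(G(0), G(6))
```

The step function is obtained from the bar graph by accumulating the heights of the bars from the left.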


4. The Distribution of a Continuous Stochastic Variable.

The correspondence between the values of a continuous
stochastic variable and the probabilities that it will take on these values may
be described either by a probability density function or by a
cumulative probability distribution function. As an example of this, consider
a line six units long on which a point is to be chosen at random. This
is an experiment similar to the one used to describe the distribution
of a discrete stochastic variable, but now the value of the stochastic
variable may be any number in the closed interval [0,6]. A
mathematical description of the stochastic nature of this experiment may be
formulated as follows:

     X: a stochastic variable representing the coordinate of the
        point selected on any try.

     x: real values which the stochastic variable X may assume.

     G(x): the probability that X will take on a value less than
        or equal to x. G(x) = Pr(X ≤ x).

     g(x): the probability density function of X.

     g(x) dx: the probability that X will take on a value between
        x and x + dx. g(x) dx = Pr(x < X < x + dx).
The probability density function and the cumulative probability distri- 
bution function associated with X may be displayed as follows: 


[Figure 13. Distribution of a Continuous Stochastic Variable. Left panel:
probability density function g(x); right panel: cumulative distribution
function G(x).]

The particular density function of Figure 13 is said to be uniform.
This means that the stochastic variable is equally likely to take on
any one of its values and accounts for the straight, horizontal line
which represents the density function. Other stochastic variables
may have a bias such that some of the values are more likely to occur
than others, and will have density functions which are not represented
by horizontal lines. In any case, the area under the density function
will always be 1, and the cumulative distribution function will
increase monotonically to a maximum value of 1 for increasing values
of x. It is to be noted that

$$\int_0^6 g(x)\,dx = 1 \quad \text{and} \quad g(x) \ge 0 \quad \text{for } 0 \le x \le 6.$$

Also,

$$G(0) = 0 \quad \text{and} \quad G(6) = 1.$$

These are fundamental relations associated with the probability density
function and the cumulative probability distribution function of the
stochastic variable X. In texts on probability theory it is shown that,
for continuous stochastic variables, the density function, when it
exists, is the derivative of the cumulative distribution function.
This is a basic relation. It should be noted that, whereas every
stochastic variable, X, has a cumulative distribution function G(x),
the density function

$$g(x) = \frac{dG(x)}{dx}$$

exists only if G(x) is differentiable.
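
For the point chosen at random on the line six units long, these relations may be checked numerically. The sketch below is an illustration only: it assumes the uniform density g(x) = 1/6 on [0,6], and the midpoint rule and the step sizes are choices made here.

```python
def g(x):
    """Uniform density on the closed interval [0, 6]."""
    return 1.0 / 6.0 if 0.0 <= x <= 6.0 else 0.0


def G(x):
    """Cumulative distribution function of the uniform density."""
    if x <= 0.0:
        return 0.0
    if x >= 6.0:
        return 1.0
    return x / 6.0


def midpoint_integral(f, a, b, n=6000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


area = midpoint_integral(g, 0.0, 6.0)  # area under the density

# Central difference of G at x = 2, to compare with g(2).
eps = 1e-6
slope_at_2 = (G(2.0 + eps) - G(2.0 - eps)) / (2.0 * eps)
print(area, slope_at_2, g(2.0))
```

The area under the density is 1 to within rounding, and the slope of G at an interior point agrees with the value of g there.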


5. The Expected Value of a Stochastic Variable.

The expected value (average value) of a discrete stochastic
variable is defined to be

$$E(X) = \bar{x} = \sum_{\text{all } i} x_i\, g(x_i), \tag{5.1}$$

and the expected value of a continuous stochastic variable which has
a density function is defined to be

$$E(X) = \bar{x} = \int_{\text{all } x} x\, g(x)\,dx. \tag{5.2}$$

The expectation, E(X), is often termed a weighted average. In the
case of a discrete stochastic variable, the "weight" associated with
$x_i$ is $g(x_i)$. In the case of a continuous stochastic variable, the
"weight" associated with x is g(x) dx, i.e., the probability that
X lies in dx. In the latter case we say, more briefly, that x is
weighted by g(x), the value of the probability density function.

In theoretical discussions it may not be desirable to distinguish
between discrete and continuous stochastic variables nor to emphasize
continuous stochastic variables which have a density function. In such
circumstances, generally, it is only necessary to refer to the
stochastic variable, say X, and its cumulative probability distribution
function, say G(x). To represent the expected value of X, it is
customary to write

$$E(X) = \int_{-\infty}^{\infty} x\, dG(x), \tag{5.3}$$

where the integral on the right represents either a Riemann-Stieltjes or a
Lebesgue-Stieltjes integral according to the degree of generality of the
theory of integration under consideration. In this paper, such integrals
will be viewed as Riemann-Stieltjes integrals. In short, we may regard
the integral of equation (5.3) as a concept which includes both of the
concepts of equations (5.1) and (5.2) as special cases. We should note
that the expected value of a function of a stochastic variable may also
be defined. For example, if h(X) is a function of the stochastic
variable X, we may write

$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\, dG(x).$$
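
Equations (5.1) and (5.2), and the expectation of a function h(X), may be illustrated with the die and the line of the preceding sections. This is a sketch only; the helper names `expected_discrete` and `expected_continuous`, the choice h(x) = x squared, and the midpoint rule are assumptions introduced for the illustration.

```python
from fractions import Fraction

# Probability function of a true die, as in Section 3.
g_die = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_discrete(h, prob_function):
    """Weighted sum of h(x) with weights g(x), as in equation (5.1)."""
    return sum(h(x) * prob for x, prob in prob_function.items())


def expected_continuous(h, density, a, b, n=6000):
    """Midpoint-rule approximation to the integral of h(x) g(x) dx, as in (5.2)."""
    step = (b - a) / n
    mids = (a + (k + 0.5) * step for k in range(n))
    return step * sum(h(x) * density(x) for x in mids)


mean_die = expected_discrete(lambda x: x, g_die)             # 7/2
mean_square_die = expected_discrete(lambda x: x * x, g_die)  # 91/6

# Uniform density 1/6 on [0, 6], as in Section 4.
mean_point = expected_continuous(lambda x: x, lambda x: 1.0 / 6.0, 0.0, 6.0)
print(mean_die, mean_square_die, mean_point)
```

The die has expected value 7/2, although no single throw can show that value, and the point chosen at random on the line has expected coordinate 3.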




As a further illustration of the notion of expectation and the use of
general functional notation, consider the following:

     P: a stochastic variable
     p: real values that may be assumed by P
     S(p): cumulative probability distribution function of P
     S'(p): probability density function of P.

The expected value of P, according to equation (5.2), is

$$E(P) = \bar{p} = \int p\, S'(p)\,dp.$$

This may be more conveniently expressed, using the Riemann-Stieltjes
integral, as

$$\bar{p} = \int_{\Omega} p\, dS,$$

where $\Omega$ is the space of p, and the differential dS is used instead
of S' dp. Thus, values of p are weighted by dS, where formerly
values of p were weighted by S'(p) dp. The symbol

$$\int_{\Omega} p\, dS$$

is a functional symbol used to express the notion of the weighting and
summing. When actual computations are carried out, dS(p) is
replaced by its equivalent expression in p, and the integration is
carried out just as in elementary calculus.
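
The weighting of p by dS may be made concrete with a Riemann-Stieltjes sum. In the sketch below the distribution function S(p) = p squared on [0,1] is a hypothetical choice made only for illustration; for it dS = 2p dp, so elementary calculus gives an expected value of 2/3. The function name `stieltjes_mean` is likewise an assumption.

```python
def stieltjes_mean(S, a=0.0, b=1.0, n=10_000):
    """Approximate the integral of p dS over [a, b] by a Riemann-Stieltjes sum.

    Each subinterval contributes its midpoint p weighted by the increment
    of S across that subinterval.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = left + h
        total += 0.5 * (left + right) * (S(right) - S(left))
    return total


# Uniform distribution: S(p) = p, so dS = dp and the mean is 1/2.
mean_uniform = stieltjes_mean(lambda p: p)

# Hypothetical distribution S(p) = p**2, so dS = 2p dp and the mean is 2/3.
mean_quadratic = stieltjes_mean(lambda p: p * p)
print(mean_uniform, mean_quadratic)
```

No density function is used in the computation; only increments of S appear, which is the sense in which the Stieltjes form covers the discrete and continuous cases alike.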


6. Joint Distribution Functions. 

So far we have discussed only distribution functions of a single
stochastic variable X. The notion of joint distribution functions of
more than one stochastic variable is often employed in probability
theory. This is nothing more than an extension of the idea of the
distribution function of a single stochastic variable. For example,
if $X_1$ and $X_2$ are two stochastic variables, then their joint
cumulative distribution function, $F(x_1,x_2)$, is the probability that
$X_1 \le x_1$ and $X_2 \le x_2$ simultaneously. That is,

$$F(x_1,x_2) = \Pr(X_1 \le x_1 \text{ and } X_2 \le x_2 \text{ simultaneously}).$$

The density function of such a distribution may be represented as a
surface in three dimensional space as follows:


[Figure 14. Joint Density Function.]

It is to be noted that

$$\iint f(x_1,x_2)\,dx_1\,dx_2 = 1 \quad \text{and} \quad f(x_1,x_2) \ge 0 \quad \text{for all } x_1, x_2.$$

Also,

$$f(x_1,x_2) = \frac{\partial^2 F(x_1,x_2)}{\partial x_1\,\partial x_2}.$$

These are fundamental relations associated with the joint distribution.
In a similar manner, the analytical notion of a joint distribution
function may be extended to any number of stochastic variables, although
the geometrical representation does not apply for more than two.
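
The relation between the joint distribution function and the joint density may be checked in a simple case. The sketch below assumes, purely for illustration, two independent stochastic variables each uniform on [0,1], for which F(x1, x2) = x1 x2 inside the unit square and the joint density is 1 there. The independence assumption and the finite-difference step are choices made here, not taken from the text.

```python
def clamp01(t):
    """Restrict t to the interval [0, 1]."""
    return min(1.0, max(0.0, t))


def F(x1, x2):
    """Joint distribution function of two independent uniform variables on [0, 1]."""
    return clamp01(x1) * clamp01(x2)


def mixed_partial(func, x1, x2, h=1e-3):
    """Central-difference estimate of the mixed second partial derivative."""
    return (
        func(x1 + h, x2 + h)
        - func(x1 + h, x2 - h)
        - func(x1 - h, x2 + h)
        + func(x1 - h, x2 - h)
    ) / (4.0 * h * h)


density_estimate = mixed_partial(F, 0.3, 0.7)
print(density_estimate)
```

At an interior point of the square the estimate agrees with the assumed joint density of 1.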


7. Bayes Theorem.

Perhaps the single mathematical concept most vital to an
understanding of statistical decision theory is the Bayes theorem of inverse
probability. To explain this theorem in terms of the example of
Chapter I, let us recall that we assumed that P, the true percentage
success of the midget submarine in a future war, is equally
likely to have any value between 0 and 100%. This is equivalent to
assuming that the a priori distribution of the parameter is uniform and,
at the outset, represents our best knowledge of P. As the problem
progresses, observations are made. These observations add to our
knowledge of P, and we therefore wish to modify the originally
assumed a priori distribution of P to what we term an a posteriori
distribution of P on the basis of the observations. Bayes theorem
provides the means to do this. That is, if an a priori distribution is
known and observations are subsequently made, Bayes theorem may
be used to modify the a priori distribution to an a posteriori
distribution on the basis of the observations. Two forms of Bayes theorem
in its application to density functions in statistical decision theory are:


Case I ($\Omega$ discrete):

$${}_{m}S'(p_j) = \frac{S'(p_j) \prod_{i=1}^{m} g(x_i \mid p_j)}{\sum_{k=1}^{n} S'(p_k) \prod_{i=1}^{m} g(x_i \mid p_k)}$$

Case II ($\Omega$ continuous):

$${}_{m}S'(p) = \frac{S'(p) \prod_{i=1}^{m} g(x_i \mid p)}{\int_{\Omega} S'(p) \prod_{i=1}^{m} g(x_i \mid p)\,dp}$$

In these formulas, $g(x_i \mid p)$ is a probability function when X is
discrete and a probability density function when X is continuous,
the integer m is the number of observations taken,
and the integer n is the number of values P may assume in Case I.
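
Case I may be illustrated with a parameter confined to a few values. In the sketch below everything numerical is hypothetical: P is assumed to take the values 1/4, 1/2 and 3/4 with equal a priori probability, each observation is assumed to be 1 with probability p and 0 with probability 1 - p, and the observations 1, 1, 0 are invented for the illustration.

```python
from fractions import Fraction


def likelihood(x, p):
    """g(x | p) for an observation confined to 0 or 1: p if x = 1, else 1 - p."""
    return p if x == 1 else 1 - p


def posterior_discrete(prior, observations):
    """A posteriori probabilities of each value of p, following Case I."""
    weights = {}
    for p, prior_prob in prior.items():
        weight = prior_prob
        for x in observations:
            weight *= likelihood(x, p)
        weights[p] = weight
    total = sum(weights.values())  # the denominator of the Bayes formula
    return {p: weight / total for p, weight in weights.items()}


quarter, half, three_quarters = Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)
prior = {quarter: Fraction(1, 3), half: Fraction(1, 3), three_quarters: Fraction(1, 3)}
posterior = posterior_discrete(prior, [1, 1, 0])
print(posterior)
```

Two successes in three observations shift the weight toward the larger values of p: the a posteriori probabilities are 3/20, 2/5 and 9/20.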
To examine the theorem further, let us study an example which
finds application in this paper. Suppose

     $X_i$ (i = 1, 2, ...): a collection of independently and identically
        distributed discrete stochastic variables.

     $x_i$ (i = 1, 2, ...): real values that may be assumed by each $X_i$.
        Each $x_i$ is confined to the two values 0 or 1.

     G(x): the common cumulative probability distribution
        (step) function applicable to each of the $X_i$.

     g(x): the common probability function (bar graph)
        applicable to each of the $X_i$.

     P: a continuous stochastic variable representing the
        parameter of G(x) or g(x).

     p: real values that may be assumed by P, i.e., 0 ≤ p ≤ 1.

     S(p): the a priori cumulative probability distribution
        function of P.

     S'(p): the a priori probability density function of P.

     ${}_{m}S(p)$: the a posteriori cumulative distribution function of
        P after m observations on the $X_i$.

     ${}_{m}S'(p)$: the a posteriori density function of P after m
        observations on the $X_i$.


The Bayes formula for the density functions of P is, as in Case II,

$${}_{m}S'(p) = \frac{S'(p) \prod_{i=1}^{m} g(x_i \mid p)}{\int_{\Omega} S'(p) \prod_{i=1}^{m} g(x_i \mid p)\,dp},$$

where $\Omega$ is the space of P. Let us amplify this with some diagrams
and sample computations. Suppose each $X_i$ is distributed according
to the following diagram:

[Figure 15. Distribution of the $X_i$. Left panel: bar graph of g(x), with
g(0) = 1 - p and g(1) = p; right panel: step function of G(x), with
G(0) = 1 - p and G(1) = 1.]
The letter p stands for a value of the unknown parameter. If we 
assume the a priori density function of P to be uniform, it may be 
pictured as follows: 


[Figure 16. A Priori Density Function of P: the uniform density
S'(p) = 1 on 0 ≤ p ≤ 1.]
If we take a single observation on one of the $X_i$ with the result
$x_1 = 1$, we may apply Bayes formula as follows:

$${}_{1}S'(p) = \frac{(1)(p)}{\int_0^1 (1)(p)\,dp} = \frac{p}{1/2} = 2p.$$

This a posteriori density function may be pictured as follows:

[Figure 17. A Posteriori Density Function of P for $x_1 = 1$.]

Note that the result of the single observation, through the Bayes
formula, has modified the density function of P from uniform to a bias in
favor of the value 1. This is an intuitively reasonable result, since
the value $x_1 = 1$ was observed. Similarly, if the result of the
single observation had been $x_1 = 0$, the a posteriori density function
would have been modified from uniform to a bias in favor of the value
0. In that case,

$${}_{1}S'(p) = \frac{(1)(1-p)}{\int_0^1 (1)(1-p)\,dp} = \frac{1-p}{1/2} = 2 - 2p,$$

and the a posteriori density function becomes the following:

[Figure 18. A Posteriori Density Function of P for $x_1 = 0$.]

Again, if two observations had been taken with the results $x_1 = 1$ and
$x_2 = 0$, then the a posteriori density function would be

$${}_{2}S'(p) = \frac{(1)(p)(1-p)}{\int_0^1 (1)(p)(1-p)\,dp} = \frac{p(1-p)}{1/6} = 6p - 6p^2,$$

and is pictured as follows:

[Figure 19. A Posteriori Density Function of P for $x_1 = 1$, $x_2 = 0$.]

Note that this last density function is a parabola, and has been modified
from uniform to a bias in favor of the value 1/2. This is again an
intuitively reasonable result to follow from the two observations.
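
The three sample computations may be reproduced numerically. The sketch below applies the Case II formula with a uniform a priori density and observations confined to 0 or 1; the function names and the midpoint rule used for the normalizing integral are assumptions made for the illustration, so the results agree with 2p, 2 - 2p and 6p - 6p^2 only to within the accuracy of the numerical integration.

```python
def bernoulli_likelihood(x, p):
    """g(x | p): p when x = 1 is observed, 1 - p when x = 0 is observed."""
    return p if x == 1 else 1.0 - p


def posterior_density(prior_density, observations, n=10_000):
    """Return the a posteriori density of P on [0, 1] as a function of p."""

    def unnormalized(p):
        value = prior_density(p)
        for x in observations:
            value *= bernoulli_likelihood(x, p)
        return value

    # Normalizing integral over the parameter space [0, 1] (midpoint rule).
    h = 1.0 / n
    normalizer = h * sum(unnormalized((k + 0.5) * h) for k in range(n))
    return lambda p: unnormalized(p) / normalizer


def uniform(p):
    """Uniform a priori density on [0, 1]."""
    return 1.0


post_one = posterior_density(uniform, [1])         # compare with 2p
post_zero = posterior_density(uniform, [0])        # compare with 2 - 2p
post_one_zero = posterior_density(uniform, [1, 0]) # compare with 6p - 6p^2

p = 0.3
print(post_one(p), post_zero(p), post_one_zero(p))
```

Feeding the a posteriori density from one observation back in as the a priori density for the next gives the same result as processing both observations at once, since the formula only multiplies likelihoods and renormalizes.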
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